Synchronization coherence: A transparent hardware mechanism for cache coherence and fine-grained synchronization

نویسندگان

  • Yao Guo
  • Vladimir Vlassov
  • Raksit Ashok
  • Richard Weiss
  • Csaba Andras Moritz
چکیده

The quest to improve performance forces designers to explore finer-grained multiprocessor machines. Ever increasing chip densities based on CMOS improvements fuel research in highly parallel chip multiprocessors with 100s of processing elements. With such increasing levels of parallelism, synchronization is set to become a major performance bottleneck and efficient support for synchronization an important design criterion. Previous research has shown that integrating support for fine-grained synchronization can have significant performance benefits compared to traditional coarse-grained synchronization. Not much progress has been made in supporting fine-grained synchronization transparently to processor nodes: a key reason perhaps why wide adoption has not followed. In this paper, we propose a novel approach called Synchronization Coherence that can provide transparent finegrained synchronization and caching in a multiprocessor machine and single-chip multiprocessor. Our approach merges fine-grained synchronization mechanisms with traditional cache coherence protocols. It reduces network utilization as well as synchronization related processing overheads while adding minimal hardware complexity as compared to cache coherence mechanisms or previously reported fine-grained synchronization techniques. In addition to its benefit of making synchronization transparent to processor nodes, for the applications studied, it provides up to 23% improvement in performance and up to 24% improvement in energy efficiency with no L2 caches compared to previous fine-grained synchronization techniques. The performance improvement increases up to 38% when simulating with an ideal L2 cache system.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Efficient Fine Grained Synchronization Support Using Full/Empty Tagged Shared Memory and Cache Coherency

Performance results of machines with fine-grain synchronization on individual lock-free data items (e.g., words), such as the MIT Alewife multiprocessor, illustrate the benefits of supporting fine-grain synchronization. The performance benefits are primarily the result of allowing a dataflow style of computation in programming models, and maximizing the exposed parallelism by minimizing the pos...

متن کامل

Evaluation of Architectural Supports for Fine-grained Synchronization Mechanisms

The advent of multi-/many-core architectures demands efficient run-time supports to sustain parallel applications scalability. Synchronization mechanisms should be optimized in order to account for different scenarios, such as the interaction between threads executed on different cores as well as intra-core synchronization, i.e. involving threads executed on hardware contexts of the same core. ...

متن کامل

Hardware Support for Synchronization in the Scalable Coherent Interface (SCI)

The exploitation of the inherent parallelism in applications depends critically on the eeciency of the synchronization and data exchange primitives provided by the hardware. This paper discusses and analyses such primitives as they are implemented in a pending IEEE standard 1596 for communication in a shared memory multiprocessor, the Scalable Coherent Interface (SCI). The SCI synchronization p...

متن کامل

Adaptive Coherence Batching for Trap-Based Memory Architectures

Both software-initiated and hardware-initiated prefetching have been used to accelerate shared-memory server performance. While software-initiated prefetching require instruction set and compiler support, hardware prefetching often require additional hardware structures or extra memory state. The coherence batching scheme proposed in this paper keeps the system completely binary transparent and...

متن کامل

Update-Based Cache Coherence Protocols for Scalable Shared-Memory Multiprocessors

In this paper, two hardware-controlled update-based cache coherence protocols are presented. The paper discusses the two major disadvantages of the update protocols: inefficiency of updates and the mismatch between the granularity of synchronization and the data transfer. The paper presents two enhancements to the update-based protocols, a write combining scheme and a finer grain synchronizatio...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • J. Parallel Distrib. Comput.

دوره 68  شماره 

صفحات  -

تاریخ انتشار 2008